1 Required libraries

The required libraries are loaded. RomicsProcessor (Geremy Clair, 2021) is used to perform trackable transformations and statistics on the dataset.

library("RomicsProcessor")
library("DT") #for the rendering of the enrichment tables 
library("proteinminion") #this package was created by Geremy Clair (2021) to download UniProt protein details

2 FASTA and protein ontology download using ‘Protein Mini-On’

Using the package ‘Protein Mini-On’ (Geremy Clair, 2021, in prep.), the FASTA files for the human and bovine proteomes were downloaded from UniProt on June 15th, 2021.

# Bovine (Bos taurus) reference proteome; downloaded only if the file is not already present
if (!file.exists("./03 - Output files/Uniprot_Bos_taurus_proteome_UP000009136_2020_06_15.fasta")) {
  download_UniProtFasta(proteomeID = "UP000009136", reviewed = FALSE, export = TRUE, file = "./03 - Output files/Uniprot_Bos_taurus_proteome_UP000009136_2020_06_15.fasta")
}

# Human (Homo sapiens) reference proteome; downloaded only if the file is not already present
if (!file.exists("./03 - Output files/Uniprot_Homo_sapiens_proteome_UP000005640_2020_06_15.fasta")) {
  download_UniProtFasta(proteomeID = "UP000005640", reviewed = FALSE, export = TRUE, file = "./03 - Output files/Uniprot_Homo_sapiens_proteome_UP000005640_2020_06_15.fasta")
}

3 MaxQuant import

The iBAQ data contained in the protein table (proteinGroups.txt) were loaded, together with the corresponding metadata.

# Load iBAQ intensities, removing contaminants, reverse hits and proteins only identified by site
data <- data.frame(extractMaxQuant("./01 - Source files/proteinGroups.txt", quantification_type = "iBAQ", cont.rm = TRUE, site.rm = TRUE, rev.rm = TRUE))
## [1] "141  Proteins were removed (protein(s) only identified by site,contaminant(s),reverse hit(s))"
## [1] "iBAQ quantification was used"
# Protein identifiers for the same filtered protein groups
IDsdetails <- extractMaxQuantIDs("./01 - Source files/proteinGroups.txt", cont.rm = TRUE, site.rm = TRUE, rev.rm = TRUE)
## [1] "141  Proteins were removed (protein(s) only identified by site,contaminant(s),reverse hit(s))"
# Add the UniProt entry name (text after the last "|") as the first column
IDsdetails <- cbind(UniProt_Name = sub(".*\\|", "", IDsdetails$protein.ids), IDsdetails)
# Remove the "iBAQ." prefix from the sample column names
colnames(data) <- sub("iBAQ.", "", colnames(data))
# Load the sample metadata and lowercase its column names
metadata <- read.csv(file = "./01 - Source files/metadata.csv")
colnames(metadata) <- tolower(colnames(metadata))
# Export the filtered protein identifiers
write.csv(extractMaxQuantIDs("./01 - Source files/proteinGroups.txt", cont.rm = TRUE, site.rm = TRUE, rev.rm = TRUE), "./03 - Output files/MaxQuantIDS.csv")
## [1] "141  Proteins were removed (protein(s) only identified by site,contaminant(s),reverse hit(s))"
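The UniProt_Name column is built with sub(".*\\|", "", ...), which keeps only the text after the last "|" of each identifier. A minimal base R illustration, assuming UniProt-style identifiers of the form db|accession|entry_name (the IDs below are examples, not taken from this dataset):

```r
# Example UniProt-style identifiers (db|accession|entry_name); the last one has no "|".
ids <- c("sp|P02768|ALBU_HUMAN", "sp|P02769|ALBU_BOVIN", "P12345")

# The greedy ".*\\|" matches up to the LAST "|", so only the entry name remains.
# Identifiers without any "|" are returned unchanged.
entry_names <- sub(".*\\|", "", ids)
print(entry_names)

# The accession is the middle field; split on a literal "|" to recover it when present.
accessions <- vapply(strsplit(ids, "|", fixed = TRUE),
                     function(parts) if (length(parts) >= 2) parts[2] else parts[1],
                     character(1))
print(accessions)
```

Because the pattern is greedy, a protein group listing several identifiers would keep only the entry name of the last one.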

4 Romics_object creation

The data and metadata were placed in a romics_object. The sample names were retrieved from the metadata, and the condition will be used to color the figures and to group the samples for the statistics.

romics_proteins<- romicsCreateObject(data,metadata,main_factor = "Condition")
romics_proteins<- romicsSampleNameFromFactor(romics_proteins,factor = "sample_names")

5 Full data analysis

5.1 Data cleaning and normalization

Zero intensities were converted to missing values, and the missingness was evaluated for each channel/sample.

romics_proteins<- romicsZeroToMissing(romics_proteins)
romicsPlotMissing(romics_proteins)
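As a rough sketch of what these two steps compute, assuming that a zero intensity means "not quantified" (as MaxQuant reports it), the per-sample missingness can be obtained in base R on a toy matrix. This is an illustration, not the RomicsProcessor implementation.

```r
# Toy iBAQ-like matrix (proteins x samples); the values are made up.
ibaq <- matrix(c(1200,    0, 560,  80,
                 3400, 2900,   0,   0,
                  150,  170, 160, 140),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("prot", 1:3), paste0("S", 1:4)))

# Treat zero intensities as missing values (NA).
ibaq[ibaq == 0] <- NA

# Percentage of missing values per sample (column).
missing_pct <- colMeans(is.na(ibaq)) * 100
print(round(missing_pct, 1))
```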

Proteins were retained for quantification if at least 75% of their values (3 of 4 samples) were complete within a given condition; the overall missingness was then re-evaluated after filtering.

romics_proteins<-romicsFilterMissing(romics_proteins,percentage_completeness = 75)
## [1] "28 rows were removed for the data"
## [1] "Based on the minimum completeness set at 75%"
## [1] "at least the following number of sample(s) containing data was required:"
##        EC_mono      EVT_EC_CO EVT_EC_DSC_TRI       EVT_mono 
##              3              3              3              3
print(paste0(nrow(romics_proteins$data),"/", nrow(romics_proteins$original_data)," proteins remained after filtering", " (",round(nrow(romics_proteins$data)/nrow(romics_proteins$original_data)*100,2),"%)."))
## [1] "484/512 proteins remained after filtering (94.53%)."
romicsPlotMissing(romics_proteins)
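With 4 samples per condition, a 75% completeness threshold requires ceiling(0.75 × 4) = 3 non-missing values, matching the per-condition counts printed above. The base R sketch below assumes that a protein is kept when at least one condition reaches the threshold; the package documentation should be checked for its exact rule. The conditions A and B are hypothetical.

```r
# Toy log2 matrix: 3 proteins x 8 samples in two hypothetical conditions (A, B).
mat <- matrix(20, nrow = 3, ncol = 8,
              dimnames = list(paste0("prot", 1:3), paste0("S", 1:8)))
condition <- rep(c("A", "B"), each = 4)
mat["prot1", c(1, 2, 5, 6)] <- NA  # 2/4 present in A and in B: fails everywhere
mat["prot2", c(1, 5, 6, 7)] <- NA  # 3/4 present in A: passes in A
# prot3 has no missing values

percentage_completeness <- 75
# Minimum number of non-missing samples per condition: ceiling(0.75 * 4) = 3.
n_per_condition <- vapply(split(condition, condition), length, integer(1))
min_present <- ceiling(percentage_completeness / 100 * n_per_condition)
print(min_present)

# Non-missing counts per protein (rows) and condition (columns).
present <- sapply(split(seq_along(condition), condition),
                  function(cols) rowSums(!is.na(mat[, cols, drop = FALSE])))

# Assumed rule: keep a protein if at least one condition reaches its threshold.
# rep(..., each = nrow) lines up one threshold per column of 'present'.
meets <- present >= rep(min_present[colnames(present)], each = nrow(present))
keep <- rowSums(meets) > 0
filtered <- mat[keep, , drop = FALSE]
print(rownames(filtered))
```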

The data were log2-transformed, and the distribution boxplots were then plotted.

romics_proteins<-log2transform(romics_proteins)
distribBoxplot(romics_proteins)

As the same quantity of protein was labelled for each sample, the protein abundance distributions are expected to share a common center. Median centering was therefore performed before plotting the distribution boxplots again.

romics_proteins<-medianCenterSample(romics_proteins)
distribBoxplot(romics_proteins)
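The two transformations can be sketched in base R, assuming that median centering means subtracting each sample's median from its log2 values so that every sample ends up with a median of 0. The toy matrix mimics a second sample loaded about twice as high.

```r
# Toy intensities (proteins x samples); S2 is roughly 2x S1.
raw <- matrix(c(100, 200, 400,  800,
                210, 390, 820, 1600),
              nrow = 4,
              dimnames = list(paste0("prot", 1:4), c("S1", "S2")))

# log2 transform: a 2-fold difference becomes an additive shift of about 1.
log_mat <- log2(raw)

# Median centering: subtract each sample's (column's) median.
sample_medians <- apply(log_mat, 2, median, na.rm = TRUE)
centered <- sweep(log_mat, 2, sample_medians, "-")
print(round(sample_medians, 3))
print(round(centered, 3))
```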

The grouping of the samples was checked by hierarchical clustering.

romicsHclust(romics_proteins)
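The distance and linkage used by romicsHclust() are not shown here. The sketch below assumes the base R defaults (Euclidean distance, complete linkage) and clusters samples, not proteins, on simulated data with two hypothetical groups (G1, G2).

```r
set.seed(42)
# Toy log2 data: 50 proteins x 6 samples; the first 20 proteins are shifted by +3 in G2.
grp <- rep(c("G1", "G2"), each = 3)
baseline <- rnorm(50, mean = 20, sd = 2)
shift <- c(rep(3, 20), rep(0, 30))
mat <- sapply(seq_along(grp),
              function(j) baseline + rnorm(50, sd = 0.2) + (grp[j] == "G2") * shift)
colnames(mat) <- paste0(grp, "_", rep(1:3, 2))

# Samples are clustered, so transpose first (samples become rows of the distance input).
hc <- hclust(dist(t(mat)), method = "complete")
groups <- cutree(hc, k = 2)
print(groups)
# plot(hc) draws the corresponding dendrogram.
```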

5.2 Data imputation

Some of the subsequent statistics require complete data. Missing values were therefore imputed, assuming that the “non-detected” proteins were either of low abundance or absent, using the method developed by Tyanova et al. (PMID: 27348712). In the evaluation plot, the gray distribution is the measured data and the yellow distribution is that of the random values used for imputation.

imputeMissingEval(romics_proteins,nb_stdev = 2,width_stdev = 0.5, bin=1)

romics_proteins<-imputeMissing(romics_proteins,nb_stdev = 2,width_stdev = 0.5)
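A minimal sketch of this kind of down-shifted normal imputation, assuming that nb_stdev is the downward shift and width_stdev the width of the random distribution, both in units of the standard deviation of each sample's observed values. This is an illustration of the approach, not the package's code.

```r
set.seed(123)
# Toy log2 matrix: 200 proteins x 3 samples, 60 values set to missing at random.
mat <- matrix(rnorm(600, mean = 25, sd = 2), nrow = 200)
mat[sample(length(mat), 60)] <- NA

# For one sample: draw replacements from N(mean - nb_stdev * sd, (width_stdev * sd)^2).
impute_downshift <- function(x, nb_stdev = 2, width_stdev = 0.5) {
  obs <- x[!is.na(x)]
  mu <- mean(obs)
  s <- sd(obs)
  n_missing <- sum(is.na(x))
  x[is.na(x)] <- rnorm(n_missing, mean = mu - nb_stdev * s, sd = width_stdev * s)
  x
}

# Apply per sample (column), using the same parameters as the call above.
imputed <- apply(mat, 2, impute_downshift, nb_stdev = 2, width_stdev = 0.5)
print(round(c(observed = mean(mat, na.rm = TRUE), imputed = mean(imputed[is.na(mat)])), 2))
```

The imputed values sit in the lower tail of each sample's distribution, which is the intended behavior when non-detection is attributed to low abundance.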

The sample grouping was checked again after imputation, this time by principal component analysis (PCA).

indPCAplot(romics_proteins, plotType = "percentage")

indPCAplot(romics_proteins, plotType = "individual",Xcomp=1,Ycomp =2)

indPCAplot(romics_proteins, plotType = "individual",Xcomp=2,Ycomp =3)

indPCA3D(romics_proteins)
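For reference, a base R PCA of samples can be sketched with prcomp(), assuming samples are the observations and proteins the variables, centered but not scaled (the options used by indPCAplot() are not shown here). The groups A and B are hypothetical.

```r
set.seed(7)
# Toy imputed log2 data: 100 proteins x 6 samples; 30 proteins are shifted by +2 in group B.
grp <- rep(c("A", "B"), each = 3)
baseline <- rnorm(100, mean = 22, sd = 2)
effect <- c(rep(2, 30), rep(0, 70))
mat <- sapply(seq_along(grp),
              function(j) baseline + rnorm(100, sd = 0.3) + (grp[j] == "B") * effect)
colnames(mat) <- paste0(grp, seq_along(grp))

# Transpose so that samples are rows; each protein is centered across samples.
pca <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each component.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(pct_var, 1))

# Sample coordinates on PC1 and PC2.
print(round(pca$x[, 1:2], 2))
```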

5.3 Statistics

The means and standard deviations were calculated for each group.

romics_proteins<-romicsMean(romics_proteins)
## [1] "The Statistics layer was added to your object"
## [1] "Means columns (*_mean) were added to the statistics"
romics_proteins<-romicsSd(romics_proteins)
## [1] "The standard deviation columns (*_sd) were added to the statistics"
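The per-group summaries amount to a mean and a standard deviation per protein within each condition. A base R sketch on a toy matrix, borrowing the *_mean and *_sd column suffixes from the messages above; the conditions A and B are hypothetical.

```r
# Toy log2 matrix: 2 proteins x 8 samples, conditions A (S1-S4) and B (S5-S8).
mat <- matrix(c(20, 21, 22, 23, 10, 12, 14, 16,
                 5,  5,  5,  5,  1,  2,  3,  4),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("prot1", "prot2"), paste0("S", 1:8)))
condition <- rep(c("A", "B"), each = 4)

# Apply a summary function to each protein within each condition.
group_stat <- function(FUN) {
  sapply(split(seq_along(condition), condition),
         function(cols) apply(mat[, cols, drop = FALSE], 1, FUN, na.rm = TRUE))
}
means <- group_stat(mean)
sds <- group_stat(sd)
colnames(means) <- paste0(colnames(means), "_mean")
colnames(sds) <- paste0(colnames(sds), "_sd")
print(cbind(means, sds))
```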

General statistical tests were then performed (one-way ANOVA and Student's t-tests assuming equal variances).

romics_proteins<-romicsANOVA(romics_proteins)
## [1] "The ANOVA columns (ANOVA_p and ANOVA_padj) were added to the statistics"
romics_proteins<-romicsTtest(romics_proteins,var.equal = T)
## [1] "T_test columns were added to the statistics"
print(paste0(sum(romics_proteins$statistics$ANOVA_p<0.05), " proteins had an ANOVA p<0.05."))
## [1] "189 proteins had an ANOVA p<0.05."
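The tests above can be sketched per protein with base R, assuming an equal-variance one-way ANOVA across the four conditions, Benjamini-Hochberg adjustment for ANOVA_padj (an assumption; the package's correction method is not shown), and a Student's t-test for one example pairwise comparison. The data are simulated.

```r
set.seed(11)
condition <- factor(rep(c("EC_mono", "EVT_mono", "EVT_EC_CO", "EVT_EC_DSC_TRI"), each = 4))

# Toy log2 data: 30 proteins x 16 samples; the first 10 proteins are higher in EVT_mono.
mat <- matrix(rnorm(30 * 16, mean = 20), nrow = 30)
mat[1:10, condition == "EVT_mono"] <- mat[1:10, condition == "EVT_mono"] + 5

# One-way ANOVA per protein (equal variances assumed).
anova_p <- apply(mat, 1, function(y) oneway.test(y ~ condition, var.equal = TRUE)$p.value)
# Multiple-testing correction (Benjamini-Hochberg assumed here).
anova_padj <- p.adjust(anova_p, method = "BH")

# Student's t-test (var.equal = TRUE) for one pairwise comparison.
ttest_p <- apply(mat, 1, function(y)
  t.test(y[condition == "EVT_mono"], y[condition == "EC_mono"], var.equal = TRUE)$p.value)

print(paste0(sum(anova_p < 0.05), " proteins had an ANOVA p<0.05."))
```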

A heatmap of the proteins with an ANOVA p < 0.05 was plotted, and the resulting protein clusters (4) were saved in the statistics layer, followed by the Z-scores.

romicsHeatmap(romics_proteins,variable_hclust_number = 4,ANOVA_filter = "p", p=0.05,sample_hclust = F)

romics_proteins<-romicsVariableHclust(romics_proteins,clusters = 4,ANOVA_filter = "p",p= 0.05,plot = F)
## [1] "The columns hclust_clusters was added to the statistics"
romics_proteins<-romicsZscores(romics_proteins)
## [1] "Z_score_ columns were added to the statistics"
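Row Z-scores and protein clustering can be sketched in base R as follows. This assumes Z-scores are computed per protein across samples and that proteins are clustered on those profiles with Euclidean distance and complete linkage before cutting the tree into 4 clusters; the exact settings of romicsHeatmap() and romicsVariableHclust() are not shown here.

```r
set.seed(5)
# Toy log2 data: 40 proteins x 8 samples.
mat <- matrix(rnorm(40 * 8, mean = 20), nrow = 40,
              dimnames = list(paste0("prot", 1:40), paste0("S", 1:8)))

# Row Z-scores: center and scale each protein across the samples.
z <- t(scale(t(mat)))

# Cluster proteins on their Z-score profiles and cut the tree into 4 clusters.
hc <- hclust(dist(z), method = "complete")
clusters <- cutree(hc, k = 4)
print(table(clusters))
```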

The data, statistics and missing-data information were exported.

results<-romicsExportData(romics_proteins,statistics = T,missing_data = T)
write.csv(results, "./03 - Output files/implantation_proteomics_complete_results.csv")